Adds benchmarks for cub::DeviceMerge #3529

Merged: 8 commits, merged on Feb 1, 2025


elstehle
Collaborator

Description

Closes #3528

elstehle requested a review from a team as a code owner on January 26, 2025 19:46
elstehle force-pushed the enh/cub-merge-benchmarks branch from 16568b1 to 6bb30c6 on January 26, 2025 20:02
Contributor

🟩 CI finished in 1h 05m: Pass: 100%/90 | Total: 15h 30m | Avg: 10m 20s | Max: 51m 03s | Hits: 414%/12772
  • 🟩 cub: Pass: 100%/44 | Total: 7h 47m | Avg: 10m 36s | Max: 28m 39s | Hits: 540%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  7h 36m | Avg: 10m 51s | Max: 28m 39s | Hits: 540%/3552  
      🟩 arm64              Pass: 100%/2   | Total: 11m 06s | Avg:  5m 33s | Max:  5m 44s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 48m 22s | Avg:  9m 40s | Max: 25m 55s | Hits: 540%/888   
      🟩 12.5               Pass: 100%/2   | Total: 19m 14s | Avg:  9m 37s | Max:  9m 52s
      🟩 12.6               Pass: 100%/37  | Total:  6h 39m | Avg: 10m 47s | Max: 28m 39s | Hits: 540%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 39s | Avg:  4m 19s | Max:  4m 26s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 48m 22s | Avg:  9m 40s | Max: 25m 55s | Hits: 540%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 14s | Avg:  9m 37s | Max:  9m 52s
      🟩 nvcc12.6           Pass: 100%/35  | Total:  6h 30m | Avg: 11m 10s | Max: 28m 39s | Hits: 540%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 39s | Avg:  4m 19s | Max:  4m 26s
      🟩 nvcc               Pass: 100%/42  | Total:  7h 38m | Avg: 10m 54s | Max: 28m 39s | Hits: 540%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 22m 28s | Avg:  5m 37s | Max:  6m 00s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  5m 42s
      🟩 Clang16            Pass: 100%/2   | Total: 12m 17s | Avg:  6m 08s | Max:  6m 10s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  5m 47s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 15m | Avg: 10m 48s | Max: 25m 23s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 40s | Avg:  5m 50s | Max:  5m 57s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 43s | Avg:  5m 43s | Max:  5m 43s
      🟩 GCC9               Pass: 100%/2   | Total: 12m 30s | Avg:  6m 15s | Max:  6m 26s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 42s | Avg:  5m 51s | Max:  5m 59s
      🟩 GCC11              Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  6m 19s
      🟩 GCC12              Pass: 100%/4   | Total: 48m 37s | Avg: 12m 09s | Max: 19m 17s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 46m | Avg: 13m 19s | Max: 22m 43s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 50m 28s | Avg: 25m 14s | Max: 25m 55s | Hits: 540%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 55m 07s | Avg: 27m 33s | Max: 28m 39s | Hits: 540%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 14s | Avg:  9m 37s | Max:  9m 52s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 13m | Avg:  7m 50s | Max: 25m 23s
      🟩 GCC                Pass: 100%/21  | Total:  3h 29m | Avg:  9m 57s | Max: 22m 43s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 45m | Avg: 26m 23s | Max: 28m 39s | Hits: 540%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 14s | Avg:  9m 37s | Max:  9m 52s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 35m 50s | Avg: 17m 55s | Max: 19m 17s
      🟩 v100               Pass: 100%/42  | Total:  7h 11m | Avg: 10m 16s | Max: 28m 39s | Hits: 540%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 15m | Avg:  8m 31s | Max: 28m 39s | Hits: 540%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 24s | Avg: 20m 24s | Max: 20m 24s
      🟩 GraphCapture       Pass: 100%/1   | Total: 19m 58s | Avg: 19m 58s | Max: 19m 58s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 03m | Avg: 21m 06s | Max: 24m 01s
      🟩 TestGPU            Pass: 100%/2   | Total: 48m 06s | Avg: 24m 03s | Max: 25m 23s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 35m 50s | Avg: 17m 55s | Max: 19m 17s
      🟩 90a                Pass: 100%/1   | Total:  4m 58s | Avg:  4m 58s | Max:  4m 58s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 00m | Avg:  9m 00s | Max: 26m 28s | Hits: 540%/2664  
      🟩 20                 Pass: 100%/24  | Total:  4h 46m | Avg: 11m 57s | Max: 28m 39s | Hits: 540%/888   
    
  • 🟩 thrust: Pass: 100%/43 | Total: 6h 42m | Avg: 9m 21s | Max: 32m 01s | Hits: 365%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 23m 49s | Avg: 11m 54s | Max: 17m 20s
    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  6h 32m | Avg:  9m 34s | Max: 32m 01s | Hits: 365%/9220  
      🟩 arm64              Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  5m 04s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 45m 31s | Avg:  9m 06s | Max: 25m 10s | Hits: 365%/1844  
      🟩 12.5               Pass: 100%/2   | Total: 28m 50s | Avg: 14m 25s | Max: 14m 43s
      🟩 12.6               Pass: 100%/36  | Total:  5h 27m | Avg:  9m 06s | Max: 32m 01s | Hits: 365%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  5m 22s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 45m 31s | Avg:  9m 06s | Max: 25m 10s | Hits: 365%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 50s | Avg: 14m 25s | Max: 14m 43s
      🟩 nvcc12.6           Pass: 100%/34  | Total:  5h 17m | Avg:  9m 20s | Max: 32m 01s | Hits: 365%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  5m 22s
      🟩 nvcc               Pass: 100%/41  | Total:  6h 31m | Avg:  9m 33s | Max: 32m 01s | Hits: 365%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 27s | Avg:  5m 21s | Max:  5m 44s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 26s | Avg:  5m 43s | Max:  6m 00s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  5m 43s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 36s | Avg:  5m 48s | Max:  5m 58s
      🟩 Clang18            Pass: 100%/7   | Total: 46m 07s | Avg:  6m 35s | Max: 12m 15s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 55s | Avg:  5m 27s | Max:  5m 49s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 38s | Avg:  5m 38s | Max:  5m 38s
      🟩 GCC9               Pass: 100%/2   | Total: 10m 33s | Avg:  5m 16s | Max:  5m 35s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 18s | Avg:  5m 39s | Max:  5m 42s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  6m 10s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  6m 24s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 12m | Avg:  9m 00s | Max: 17m 26s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 49m 24s | Avg: 24m 42s | Max: 25m 10s | Hits: 365%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  1h 27m | Avg: 29m 05s | Max: 32m 01s | Hits: 365%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 50s | Avg: 14m 25s | Max: 14m 43s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 41m | Avg:  5m 59s | Max: 12m 15s
      🟩 GCC                Pass: 100%/19  | Total:  2h 14m | Avg:  7m 05s | Max: 17m 26s
      🟩 MSVC               Pass: 100%/5   | Total:  2h 16m | Avg: 27m 19s | Max: 32m 01s | Hits: 365%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total: 28m 50s | Avg: 14m 25s | Max: 14m 43s
    🟩 gpu
      🟩 v100               Pass: 100%/43  | Total:  6h 42m | Avg:  9m 21s | Max: 32m 01s | Hits: 365%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 06m | Avg:  8m 17s | Max: 29m 48s | Hits: 365%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 48m 17s | Avg: 16m 05s | Max: 32m 01s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 47m 01s | Avg: 15m 40s | Max: 17m 26s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 43s | Avg:  4m 43s | Max:  4m 43s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  2h 59m | Avg:  8m 59s | Max: 25m 26s | Hits: 365%/5532  
      🟩 20                 Pass: 100%/21  | Total:  3h 18m | Avg:  9m 27s | Max: 32m 01s | Hits: 365%/3688  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 29s | Avg: 5m 14s | Max: 8m 14s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  8m 14s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  8m 14s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  8m 14s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  8m 14s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  8m 14s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  8m 14s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  8m 14s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 15s | Avg:  2m 15s | Max:  2m 15s
      🟩 Test               Pass: 100%/1   | Total:  8m 14s | Avg:  8m 14s | Max:  8m 14s
    
  • 🟩 python: Pass: 100%/1 | Total: 51m 03s | Avg: 51m 03s | Max: 51m 03s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 51m 03s | Avg: 51m 03s | Max: 51m 03s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 51m 03s | Avg: 51m 03s | Max: 51m 03s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 51m 03s | Avg: 51m 03s | Max: 51m 03s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 51m 03s | Avg: 51m 03s | Max: 51m 03s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 51m 03s | Avg: 51m 03s | Max: 51m 03s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 51m 03s | Avg: 51m 03s | Max: 51m 03s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 51m 03s | Avg: 51m 03s | Max: 51m 03s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 51m 03s | Avg: 51m 03s | Max: 51m 03s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 90)

# Runner
65 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

elstehle enabled auto-merge (squash) on January 27, 2025 16:44
Contributor

🟨 CI finished in 3h 02m: Pass: 98%/90 | Total: 21h 02m | Avg: 14m 01s | Max: 1h 00m | Hits: 422%/10928
  • 🟨 thrust: Pass: 97%/43 | Total: 8h 46m | Avg: 12m 14s | Max: 33m 13s | Hits: 365%/7376

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  97%/41  | Total:  8h 11m | Avg: 11m 59s | Max: 33m 13s | Hits: 365%/7376  
      🟩 arm64              Pass: 100%/2   | Total: 34m 09s | Avg: 17m 04s | Max: 29m 22s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total: 49m 29s | Avg:  9m 53s | Max: 28m 02s | Hits: 365%/1844  
      🟩 12.5               Pass: 100%/2   | Total: 30m 28s | Avg: 15m 14s | Max: 15m 52s
      🔍 12.6               Pass:  97%/36  | Total:  7h 26m | Avg: 12m 23s | Max: 33m 13s | Hits: 365%/5532  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  5m 46s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 49m 29s | Avg:  9m 53s | Max: 28m 02s | Hits: 365%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 30m 28s | Avg: 15m 14s | Max: 15m 52s
      🔍 nvcc12.6           Pass:  97%/34  | Total:  7h 14m | Avg: 12m 47s | Max: 33m 13s | Hits: 365%/5532  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  5m 46s
      🔍 nvcc               Pass:  97%/41  | Total:  8h 34m | Avg: 12m 33s | Max: 33m 13s | Hits: 365%/7376  
    🔍 cxx: MSVC14.39 🔍
      🟩 Clang14            Pass: 100%/4   | Total: 20m 53s | Avg:  5m 13s | Max:  5m 21s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 25s | Avg:  5m 42s | Max:  6m 09s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 32s | Avg:  5m 46s | Max:  5m 55s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 08s | Avg:  5m 34s | Max:  5m 39s
      🟩 Clang18            Pass: 100%/7   | Total: 55m 23s | Avg:  7m 54s | Max: 20m 23s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 17s | Avg:  5m 38s | Max:  5m 53s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 30s | Avg:  5m 30s | Max:  5m 30s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 45s | Avg:  5m 52s | Max:  6m 03s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 43s | Avg:  5m 51s | Max:  5m 52s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 31s | Avg:  5m 45s | Max:  6m 04s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 55s | Avg:  6m 27s | Max:  6m 32s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 56m | Avg: 22m 00s | Max: 33m 13s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 56m 21s | Avg: 28m 10s | Max: 28m 19s | Hits: 365%/3688  
      🔍 MSVC14.39          Pass:  66%/3   | Total:  1h 28m | Avg: 29m 22s | Max: 32m 28s | Hits: 365%/3688  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 30m 28s | Avg: 15m 14s | Max: 15m 52s
    🔍 cxx_family: MSVC 🔍
      🟩 Clang              Pass: 100%/17  | Total:  1h 50m | Avg:  6m 29s | Max: 20m 23s
      🟩 GCC                Pass: 100%/19  | Total:  4h 00m | Avg: 12m 40s | Max: 33m 13s
      🔍 MSVC               Pass:  80%/5   | Total:  2h 24m | Avg: 28m 53s | Max: 32m 28s | Hits: 365%/7376  
      🟩 NVHPC              Pass: 100%/2   | Total: 30m 28s | Avg: 15m 14s | Max: 15m 52s
    🔍 jobs: TestCPU 🔍
      🟩 Build              Pass: 100%/37  | Total:  7h 09m | Avg: 11m 35s | Max: 33m 13s | Hits: 365%/7376  
      🔍 TestCPU            Pass:  66%/3   | Total: 48m 11s | Avg: 16m 03s | Max: 32m 28s
      🟩 TestGPU            Pass: 100%/3   | Total: 48m 45s | Avg: 16m 15s | Max: 20m 23s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total:  3h 35m | Avg: 10m 46s | Max: 33m 12s | Hits: 365%/5532  
      🔍 20                 Pass:  95%/21  | Total:  4h 31m | Avg: 12m 56s | Max: 33m 13s | Hits: 365%/1844  
    🟨 gpu
      🟨 v100               Pass:  97%/43  | Total:  8h 46m | Avg: 12m 14s | Max: 33m 13s | Hits: 365%/7376  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 38m 50s | Avg: 19m 25s | Max: 25m 56s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 18m 34s | Avg: 18m 34s | Max: 18m 34s
    
  • 🟩 cub: Pass: 100%/44 | Total: 11h 08m | Avg: 15m 11s | Max: 1h 00m | Hits: 540%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total: 10h 03m | Avg: 14m 22s | Max:  1h 00m | Hits: 540%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  1h 05m | Avg: 32m 35s | Max: 59m 23s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 54m 45s | Avg: 10m 57s | Max: 29m 57s | Hits: 540%/888   
      🟩 12.5               Pass: 100%/2   | Total: 19m 45s | Avg:  9m 52s | Max: 10m 01s
      🟩 12.6               Pass: 100%/37  | Total:  9h 54m | Avg: 16m 03s | Max:  1h 00m | Hits: 540%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 44s | Avg:  4m 22s | Max:  4m 30s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 54m 45s | Avg: 10m 57s | Max: 29m 57s | Hits: 540%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 45s | Avg:  9m 52s | Max: 10m 01s
      🟩 nvcc12.6           Pass: 100%/35  | Total:  9h 45m | Avg: 16m 43s | Max:  1h 00m | Hits: 540%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 44s | Avg:  4m 22s | Max:  4m 30s
      🟩 nvcc               Pass: 100%/42  | Total: 10h 59m | Avg: 15m 42s | Max:  1h 00m | Hits: 540%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 24m 51s | Avg:  6m 12s | Max:  6m 27s
      🟩 Clang15            Pass: 100%/2   | Total: 12m 41s | Avg:  6m 20s | Max:  6m 35s
      🟩 Clang16            Pass: 100%/2   | Total: 13m 26s | Avg:  6m 43s | Max:  6m 48s
      🟩 Clang17            Pass: 100%/2   | Total: 12m 19s | Avg:  6m 09s | Max:  6m 21s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 14m | Avg: 10m 41s | Max: 26m 07s
      🟩 GCC7               Pass: 100%/2   | Total: 12m 00s | Avg:  6m 00s | Max:  6m 06s
      🟩 GCC8               Pass: 100%/1   | Total:  6m 16s | Avg:  6m 16s | Max:  6m 16s
      🟩 GCC9               Pass: 100%/2   | Total: 13m 33s | Avg:  6m 46s | Max:  7m 00s
      🟩 GCC10              Pass: 100%/2   | Total: 13m 00s | Avg:  6m 30s | Max:  6m 34s
      🟩 GCC11              Pass: 100%/2   | Total: 13m 35s | Avg:  6m 47s | Max:  6m 58s
      🟩 GCC12              Pass: 100%/4   | Total: 37m 35s | Avg:  9m 23s | Max: 19m 04s
      🟩 GCC13              Pass: 100%/8   | Total:  4h 52m | Avg: 36m 36s | Max:  1h 00m
      🟩 MSVC14.29          Pass: 100%/2   | Total: 58m 52s | Avg: 29m 26s | Max: 29m 57s | Hits: 540%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 03m | Avg: 31m 31s | Max: 32m 45s | Hits: 540%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 45s | Avg:  9m 52s | Max: 10m 01s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 18m | Avg:  8m 07s | Max: 26m 07s
      🟩 GCC                Pass: 100%/21  | Total:  6h 28m | Avg: 18m 31s | Max:  1h 00m
      🟩 MSVC               Pass: 100%/4   | Total:  2h 01m | Avg: 30m 28s | Max: 32m 45s | Hits: 540%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 45s | Avg:  9m 52s | Max: 10m 01s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 24m 00s | Avg: 12m 00s | Max: 19m 04s
      🟩 v100               Pass: 100%/42  | Total: 10h 44m | Avg: 15m 20s | Max:  1h 00m | Hits: 540%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  8h 33m | Avg: 13m 51s | Max:  1h 00m | Hits: 540%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 47s | Avg: 21m 47s | Max: 21m 47s
      🟩 GraphCapture       Pass: 100%/1   | Total: 22m 01s | Avg: 22m 01s | Max: 22m 01s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 01m | Avg: 20m 30s | Max: 22m 10s
      🟩 TestGPU            Pass: 100%/2   | Total: 50m 22s | Avg: 25m 11s | Max: 26m 07s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 24m 00s | Avg: 12m 00s | Max: 19m 04s
      🟩 90a                Pass: 100%/1   | Total: 24m 12s | Avg: 24m 12s | Max: 24m 12s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  4h 14m | Avg: 12m 42s | Max:  1h 00m | Hits: 540%/2664  
      🟩 20                 Pass: 100%/24  | Total:  6h 54m | Avg: 17m 16s | Max:  1h 00m | Hits: 540%/888   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 8m 53s | Avg: 4m 26s | Max: 6m 54s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 53s | Avg:  4m 26s | Max:  6m 54s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  8m 53s | Avg:  4m 26s | Max:  6m 54s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  8m 53s | Avg:  4m 26s | Max:  6m 54s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  8m 53s | Avg:  4m 26s | Max:  6m 54s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  8m 53s | Avg:  4m 26s | Max:  6m 54s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  8m 53s | Avg:  4m 26s | Max:  6m 54s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  8m 53s | Avg:  4m 26s | Max:  6m 54s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  1m 59s | Avg:  1m 59s | Max:  1m 59s
      🟩 Test               Pass: 100%/1   | Total:  6m 54s | Avg:  6m 54s | Max:  6m 54s
    
  • 🟩 python: Pass: 100%/1 | Total: 59m 09s | Avg: 59m 09s | Max: 59m 09s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 59m 09s | Avg: 59m 09s | Max: 59m 09s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 59m 09s | Avg: 59m 09s | Max: 59m 09s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 59m 09s | Avg: 59m 09s | Max: 59m 09s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 59m 09s | Avg: 59m 09s | Max: 59m 09s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 59m 09s | Avg: 59m 09s | Max: 59m 09s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 59m 09s | Avg: 59m 09s | Max: 59m 09s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 59m 09s | Avg: 59m 09s | Max: 59m 09s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 59m 09s | Avg: 59m 09s | Max: 59m 09s
    

Contributor

🟨 CI finished in 1d 01h: Pass: 98%/89 | Total: 15h 03m | Avg: 10m 09s | Max: 42m 25s | Hits: 422%/10928
  • 🟨 cub: Pass: 97%/44 | Total: 7h 48m | Avg: 10m 38s | Max: 35m 25s | Hits: 540%/3552

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  97%/42  | Total:  7h 36m | Avg: 10m 51s | Max: 35m 25s | Hits: 540%/3552  
      🟩 arm64              Pass: 100%/2   | Total: 12m 01s | Avg:  6m 00s | Max:  6m 10s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total: 47m 55s | Avg:  9m 35s | Max: 24m 35s | Hits: 540%/888   
      🟩 12.5               Pass: 100%/2   | Total: 19m 20s | Avg:  9m 40s | Max:  9m 57s
      🔍 12.6               Pass:  97%/37  | Total:  6h 40m | Avg: 10m 50s | Max: 35m 25s | Hits: 540%/2664  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 48s | Avg:  4m 24s | Max:  4m 26s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 47m 55s | Avg:  9m 35s | Max: 24m 35s | Hits: 540%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 20s | Avg:  9m 40s | Max:  9m 57s
      🔍 nvcc12.6           Pass:  97%/35  | Total:  6h 32m | Avg: 11m 12s | Max: 35m 25s | Hits: 540%/2664  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 48s | Avg:  4m 24s | Max:  4m 26s
      🔍 nvcc               Pass:  97%/42  | Total:  7h 39m | Avg: 10m 56s | Max: 35m 25s | Hits: 540%/3552  
    🔍 cxx: GCC12 🔍
      🟩 Clang14            Pass: 100%/4   | Total: 23m 52s | Avg:  5m 58s | Max:  6m 33s
      🟩 Clang15            Pass: 100%/2   | Total: 12m 51s | Avg:  6m 25s | Max:  6m 31s
      🟩 Clang16            Pass: 100%/2   | Total: 12m 27s | Avg:  6m 13s | Max:  6m 24s
      🟩 Clang17            Pass: 100%/2   | Total: 12m 13s | Avg:  6m 06s | Max:  6m 11s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 21m | Avg: 11m 34s | Max: 30m 04s
      🟩 GCC7               Pass: 100%/2   | Total: 12m 53s | Avg:  6m 26s | Max:  6m 33s
      🟩 GCC8               Pass: 100%/1   | Total:  6m 00s | Avg:  6m 00s | Max:  6m 00s
      🟩 GCC9               Pass: 100%/2   | Total: 12m 25s | Avg:  6m 12s | Max:  6m 13s
      🟩 GCC10              Pass: 100%/2   | Total: 12m 25s | Avg:  6m 12s | Max:  6m 29s
      🟩 GCC11              Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 31s
      🔍 GCC12              Pass:  75%/4   | Total: 17m 33s | Avg:  4m 23s | Max:  6m 37s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 56m | Avg: 14m 33s | Max: 35m 25s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 55m 32s | Avg: 27m 46s | Max: 30m 57s | Hits: 540%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 00m | Avg: 30m 13s | Max: 30m 46s | Hits: 540%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 20s | Avg:  9m 40s | Max:  9m 57s
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/17  | Total:  2h 22m | Avg:  8m 22s | Max: 30m 04s
      🔍 GCC                Pass:  95%/21  | Total:  3h 10m | Avg:  9m 04s | Max: 35m 25s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 55m | Avg: 28m 59s | Max: 30m 57s | Hits: 540%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 20s | Avg:  9m 40s | Max:  9m 57s
    🔍 gpu: h100 🔍
      🔍 h100               Pass:  50%/2   | Total:  4m 33s | Avg:  2m 16s | Max:  4m 33s
      🟩 v100               Pass: 100%/42  | Total:  7h 43m | Avg: 11m 02s | Max: 35m 25s | Hits: 540%/3552  
    🔍 jobs: HostLaunch 🔍
      🟩 Build              Pass: 100%/37  | Total:  5h 22m | Avg:  8m 43s | Max: 30m 57s | Hits: 540%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 18m 29s | Avg: 18m 29s | Max: 18m 29s
      🟩 GraphCapture       Pass: 100%/1   | Total: 14m 25s | Avg: 14m 25s | Max: 14m 25s
      🔍 HostLaunch         Pass:  66%/3   | Total: 47m 13s | Avg: 15m 44s | Max: 23m 38s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 44s | Max: 35m 25s
    🔍 sm: 90 🔍
      🔍 90                 Pass:  50%/2   | Total:  4m 33s | Avg:  2m 16s | Max:  4m 33s
      🟩 90a                Pass: 100%/1   | Total:  4m 51s | Avg:  4m 51s | Max:  4m 51s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total:  3h 13m | Avg:  9m 39s | Max: 30m 57s | Hits: 540%/2664  
      🔍 20                 Pass:  95%/24  | Total:  4h 34m | Avg: 11m 27s | Max: 35m 25s | Hits: 540%/888   
    
  • 🟩 thrust: Pass: 100%/42 | Total: 6h 22m | Avg: 9m 05s | Max: 30m 05s | Hits: 365%/7376

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 22m 02s | Avg: 11m 01s | Max: 16m 21s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total:  6h 12m | Avg:  9m 18s | Max: 30m 05s | Hits: 365%/7376  
      🟩 arm64              Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  5m 10s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 44m 43s | Avg:  8m 56s | Max: 24m 20s | Hits: 365%/1844  
      🟩 12.5               Pass: 100%/2   | Total: 29m 25s | Avg: 14m 42s | Max: 14m 54s
      🟩 12.6               Pass: 100%/35  | Total:  5h 07m | Avg:  8m 47s | Max: 30m 05s | Hits: 365%/5532  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 31s | Avg:  5m 15s | Max:  5m 18s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 44m 43s | Avg:  8m 56s | Max: 24m 20s | Hits: 365%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 29m 25s | Avg: 14m 42s | Max: 14m 54s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  4h 57m | Avg:  9m 00s | Max: 30m 05s | Hits: 365%/5532  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 31s | Avg:  5m 15s | Max:  5m 18s
      🟩 nvcc               Pass: 100%/40  | Total:  6h 11m | Avg:  9m 17s | Max: 30m 05s | Hits: 365%/7376  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 10s | Avg:  5m 17s | Max:  5m 36s
      🟩 Clang15            Pass: 100%/2   | Total: 12m 23s | Avg:  6m 11s | Max:  6m 18s
      🟩 Clang16            Pass: 100%/2   | Total: 12m 01s | Avg:  6m 00s | Max:  6m 04s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 05s | Avg:  5m 32s | Max:  5m 43s
      🟩 Clang18            Pass: 100%/7   | Total: 49m 45s | Avg:  7m 06s | Max: 15m 37s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 59s | Avg:  5m 29s | Max:  5m 40s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 16s | Avg:  5m 16s | Max:  5m 16s
      🟩 GCC9               Pass: 100%/2   | Total: 10m 42s | Avg:  5m 21s | Max:  5m 39s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 01s | Avg:  5m 30s | Max:  5m 32s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 30s | Avg:  5m 45s | Max:  5m 48s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 28s | Avg:  6m 14s | Max:  6m 39s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 13m | Avg:  9m 10s | Max: 21m 47s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 52m 44s | Avg: 26m 22s | Max: 28m 24s | Hits: 365%/3688  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 58m 18s | Avg: 29m 09s | Max: 30m 05s | Hits: 365%/3688  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 29m 25s | Avg: 14m 42s | Max: 14m 54s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 46m | Avg:  6m 15s | Max: 15m 37s
      🟩 GCC                Pass: 100%/19  | Total:  2h 15m | Avg:  7m 07s | Max: 21m 47s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 51m | Avg: 27m 45s | Max: 30m 05s | Hits: 365%/7376  
      🟩 NVHPC              Pass: 100%/2   | Total: 29m 25s | Avg: 14m 42s | Max: 14m 54s
    🟩 gpu
      🟩 v100               Pass: 100%/42  | Total:  6h 22m | Avg:  9m 05s | Max: 30m 05s | Hits: 365%/7376  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 12m | Avg:  8m 26s | Max: 30m 05s | Hits: 365%/7376  
      🟩 TestCPU            Pass: 100%/2   | Total: 16m 04s | Avg:  8m 02s | Max:  8m 09s
      🟩 TestGPU            Pass: 100%/3   | Total: 53m 45s | Avg: 17m 55s | Max: 21m 47s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 37s | Avg:  4m 37s | Max:  4m 37s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 05m | Avg:  9m 15s | Max: 28m 24s | Hits: 365%/5532  
      🟩 20                 Pass: 100%/20  | Total:  2h 55m | Avg:  8m 45s | Max: 30m 05s | Hits: 365%/1844  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 05s | Avg: 5m 32s | Max: 9m 00s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 05s | Avg:  5m 32s | Max:  9m 00s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 11m 05s | Avg:  5m 32s | Max:  9m 00s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 11m 05s | Avg:  5m 32s | Max:  9m 00s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 11m 05s | Avg:  5m 32s | Max:  9m 00s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 11m 05s | Avg:  5m 32s | Max:  9m 00s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 11m 05s | Avg:  5m 32s | Max:  9m 00s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 11m 05s | Avg:  5m 32s | Max:  9m 00s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 05s | Avg:  2m 05s | Max:  2m 05s
      🟩 Test               Pass: 100%/1   | Total:  9m 00s | Avg:  9m 00s | Max:  9m 00s
    
  • 🟩 python: Pass: 100%/1 | Total: 42m 25s | Avg: 42m 25s | Max: 42m 25s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 42m 25s | Avg: 42m 25s | Max: 42m 25s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 42m 25s | Avg: 42m 25s | Max: 42m 25s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 42m 25s | Avg: 42m 25s | Max: 42m 25s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 42m 25s | Avg: 42m 25s | Max: 42m 25s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 42m 25s | Avg: 42m 25s | Max: 42m 25s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 42m 25s | Avg: 42m 25s | Max: 42m 25s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 42m 25s | Avg: 42m 25s | Max: 42m 25s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 42m 25s | Avg: 42m 25s | Max: 42m 25s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 89)

# Runner
65 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
8 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

Contributor

🟩 CI finished in 4h 15m: Pass: 100%/89 | Total: 1d 16h | Avg: 27m 07s | Max: 1h 15m | Hits: 236%/10936
  • 🟩 cub: Pass: 100%/44 | Total: 1d 00h | Avg: 33m 34s | Max: 1h 15m | Hits: 277%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total: 23h 26m | Avg: 33m 29s | Max:  1h 15m | Hits: 277%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 21s | Max:  1h 04m
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 10m | Avg: 38m 02s | Max:  1h 02m | Hits: 250%/888   
      🟩 12.5               Pass: 100%/2   | Total:  2h 27m | Avg:  1h 13m | Max:  1h 13m
      🟩 12.6               Pass: 100%/37  | Total: 18h 59m | Avg: 30m 48s | Max:  1h 15m | Hits: 285%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 03s | Avg:  4m 31s | Max:  4m 34s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 10m | Avg: 38m 02s | Max:  1h 02m | Hits: 250%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 27m | Avg:  1h 13m | Max:  1h 13m
      🟩 nvcc12.6           Pass: 100%/35  | Total: 18h 50m | Avg: 32m 18s | Max:  1h 15m | Hits: 285%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 03s | Avg:  4m 31s | Max:  4m 34s
      🟩 nvcc               Pass: 100%/42  | Total:  1d 00h | Avg: 34m 57s | Max:  1h 15m | Hits: 277%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 44m | Avg: 56m 13s | Max: 59m 49s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 47m | Avg: 53m 59s | Max: 54m 10s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 55m | Avg: 57m 43s | Max: 59m 11s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 46m | Avg: 53m 11s | Max: 53m 55s
      🟩 Clang18            Pass: 100%/7   | Total:  3h 54m | Avg: 33m 26s | Max:  1h 04m
      🟩 GCC7               Pass: 100%/2   | Total: 12m 43s | Avg:  6m 21s | Max:  6m 23s
      🟩 GCC8               Pass: 100%/1   | Total:  6m 35s | Avg:  6m 35s | Max:  6m 35s
      🟩 GCC9               Pass: 100%/2   | Total: 12m 47s | Avg:  6m 23s | Max:  6m 26s
      🟩 GCC10              Pass: 100%/2   | Total: 14m 07s | Avg:  7m 03s | Max:  7m 47s
      🟩 GCC11              Pass: 100%/2   | Total: 13m 11s | Avg:  6m 35s | Max:  6m 46s
      🟩 GCC12              Pass: 100%/4   | Total: 37m 59s | Avg:  9m 29s | Max: 19m 04s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 43m | Avg: 20m 25s | Max: 46m 19s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 13m | Hits: 268%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 15m | Hits: 285%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 27m | Avg:  1h 13m | Max:  1h 13m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 13h 08m | Avg: 46m 23s | Max:  1h 04m
      🟩 GCC                Pass: 100%/21  | Total:  4h 20m | Avg: 12m 25s | Max: 46m 19s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 40m | Avg:  1h 10m | Max:  1h 15m | Hits: 277%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 27m | Avg:  1h 13m | Max:  1h 13m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 24m 03s | Avg: 12m 01s | Max: 19m 04s
      🟩 v100               Pass: 100%/42  | Total:  1d 00h | Avg: 34m 36s | Max:  1h 15m | Hits: 277%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 21h 13m | Avg: 34m 25s | Max:  1h 15m | Hits: 277%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 39m 19s | Avg: 39m 19s | Max: 39m 19s
      🟩 GraphCapture       Pass: 100%/1   | Total: 29m 03s | Avg: 29m 03s | Max: 29m 03s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 09m | Avg: 23m 11s | Max: 26m 01s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 58s | Max: 46m 19s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 24m 03s | Avg: 12m 01s | Max: 19m 04s
      🟩 90a                Pass: 100%/1   | Total:  5m 23s | Avg:  5m 23s | Max:  5m 23s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 11h 15m | Avg: 33m 45s | Max:  1h 13m | Hits: 269%/2664  
      🟩 20                 Pass: 100%/24  | Total: 13h 22m | Avg: 33m 25s | Max:  1h 15m | Hits: 298%/888   
    
  • 🟩 thrust: Pass: 100%/42 | Total: 14h 38m | Avg: 20m 55s | Max: 1h 04m | Hits: 216%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 17m 02s | Avg:  8m 31s | Max: 11m 28s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total: 14h 06m | Avg: 21m 09s | Max:  1h 04m | Hits: 216%/7384  
      🟩 arm64              Pass: 100%/2   | Total: 32m 12s | Avg: 16m 06s | Max: 27m 02s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  2h 00m | Avg: 24m 10s | Max: 46m 50s | Hits: 256%/1846  
      🟩 12.5               Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 04m
      🟩 12.6               Pass: 100%/35  | Total: 10h 31m | Avg: 18m 02s | Max: 54m 37s | Hits: 202%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 44s | Avg:  5m 22s | Max:  5m 41s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 00m | Avg: 24m 10s | Max: 46m 50s | Hits: 256%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 04m
      🟩 nvcc12.6           Pass: 100%/33  | Total: 10h 20m | Avg: 18m 48s | Max: 54m 37s | Hits: 202%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 44s | Avg:  5m 22s | Max:  5m 41s
      🟩 nvcc               Pass: 100%/40  | Total: 14h 27m | Avg: 21m 41s | Max:  1h 04m | Hits: 216%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 23s | Max: 33m 19s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 02m | Avg: 31m 22s | Max: 31m 39s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 04m | Avg: 32m 06s | Max: 32m 09s
      🟩 Clang17            Pass: 100%/2   | Total: 58m 44s | Avg: 29m 22s | Max: 30m 08s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 56m | Avg: 16m 41s | Max: 31m 02s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 55s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 31s | Avg:  5m 31s | Max:  5m 31s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 33s | Avg:  5m 46s | Max:  6m 02s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 57s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  6m 10s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 35s | Avg:  6m 17s | Max:  6m 31s
      🟩 GCC13              Pass: 100%/8   | Total: 58m 34s | Avg:  7m 19s | Max: 11m 30s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 36m | Avg: 48m 19s | Max: 49m 49s | Hits: 228%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 44m | Avg: 52m 21s | Max: 54m 37s | Hits: 203%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 04m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 08m | Avg: 25m 10s | Max: 33m 19s
      🟩 GCC                Pass: 100%/19  | Total:  2h 02m | Avg:  6m 28s | Max: 11m 30s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 21m | Avg: 50m 20s | Max: 54m 37s | Hits: 216%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 04m
    🟩 gpu
      🟩 v100               Pass: 100%/42  | Total: 14h 38m | Avg: 20m 55s | Max:  1h 04m | Hits: 216%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 13h 48m | Avg: 22m 23s | Max:  1h 04m | Hits: 216%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 16m 07s | Avg:  8m 03s | Max:  8m 11s
      🟩 TestGPU            Pass: 100%/3   | Total: 33m 55s | Avg: 11m 18s | Max: 11m 30s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 37s | Avg:  4m 37s | Max:  4m 37s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  7h 31m | Avg: 22m 35s | Max:  1h 04m | Hits: 215%/5538  
      🟩 20                 Pass: 100%/20  | Total:  6h 49m | Avg: 20m 29s | Max:  1h 01m | Hits: 217%/1846  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 55s | Avg: 5m 27s | Max: 8m 48s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 55s | Avg:  5m 27s | Max:  8m 48s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 55s | Avg:  5m 27s | Max:  8m 48s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 55s | Avg:  5m 27s | Max:  8m 48s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 55s | Avg:  5m 27s | Max:  8m 48s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 55s | Avg:  5m 27s | Max:  8m 48s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 55s | Avg:  5m 27s | Max:  8m 48s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 55s | Avg:  5m 27s | Max:  8m 48s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 07s | Avg:  2m 07s | Max:  2m 07s
      🟩 Test               Pass: 100%/1   | Total:  8m 48s | Avg:  8m 48s | Max:  8m 48s
    
  • 🟩 python: Pass: 100%/1 | Total: 47m 36s | Avg: 47m 36s | Max: 47m 36s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 47m 36s | Avg: 47m 36s | Max: 47m 36s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 47m 36s | Avg: 47m 36s | Max: 47m 36s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 47m 36s | Avg: 47m 36s | Max: 47m 36s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 47m 36s | Avg: 47m 36s | Max: 47m 36s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 47m 36s | Avg: 47m 36s | Max: 47m 36s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 47m 36s | Avg: 47m 36s | Max: 47m 36s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 47m 36s | Avg: 47m 36s | Max: 47m 36s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 47m 36s | Avg: 47m 36s | Max: 47m 36s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 89)

# Runner
65 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
8 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1

Contributor

🟩 CI finished in 1h 57m: Pass: 100%/89 | Total: 16h 36m | Avg: 11m 11s | Max: 1h 48m | Hits: 422%/10936
  • 🟩 cub: Pass: 100%/44 | Total: 9h 22m | Avg: 12m 47s | Max: 1h 48m | Hits: 540%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  9h 10m | Avg: 13m 06s | Max:  1h 48m | Hits: 540%/3552  
      🟩 arm64              Pass: 100%/2   | Total: 11m 54s | Avg:  5m 57s | Max:  6m 09s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 51m 10s | Avg: 10m 14s | Max: 27m 02s | Hits: 540%/888   
      🟩 12.5               Pass: 100%/2   | Total: 19m 27s | Avg:  9m 43s | Max:  9m 50s
      🟩 12.6               Pass: 100%/37  | Total:  8h 11m | Avg: 13m 17s | Max:  1h 48m | Hits: 540%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 52s | Avg:  4m 26s | Max:  4m 37s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 51m 10s | Avg: 10m 14s | Max: 27m 02s | Hits: 540%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 27s | Avg:  9m 43s | Max:  9m 50s
      🟩 nvcc12.6           Pass: 100%/35  | Total:  8h 03m | Avg: 13m 48s | Max:  1h 48m | Hits: 540%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 52s | Avg:  4m 26s | Max:  4m 37s
      🟩 nvcc               Pass: 100%/42  | Total:  9h 13m | Avg: 13m 10s | Max:  1h 48m | Hits: 540%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 24m 21s | Avg:  6m 05s | Max:  6m 29s
      🟩 Clang15            Pass: 100%/2   | Total: 13m 20s | Avg:  6m 40s | Max:  6m 47s
      🟩 Clang16            Pass: 100%/2   | Total: 12m 27s | Avg:  6m 13s | Max:  6m 15s
      🟩 Clang17            Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  6m 16s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 11m | Avg: 10m 12s | Max: 24m 07s
      🟩 GCC7               Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max:  6m 29s
      🟩 GCC8               Pass: 100%/1   | Total:  6m 32s | Avg:  6m 32s | Max:  6m 32s
      🟩 GCC9               Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 30s
      🟩 GCC10              Pass: 100%/2   | Total: 13m 24s | Avg:  6m 42s | Max:  6m 50s
      🟩 GCC11              Pass: 100%/2   | Total: 13m 04s | Avg:  6m 32s | Max:  6m 35s
      🟩 GCC12              Pass: 100%/4   | Total: 45m 56s | Avg: 11m 29s | Max: 27m 47s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 11m | Avg: 23m 53s | Max:  1h 48m
      🟩 MSVC14.29          Pass: 100%/2   | Total: 56m 45s | Avg: 28m 22s | Max: 29m 43s | Hits: 540%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 56m 49s | Avg: 28m 24s | Max: 29m 31s | Hits: 540%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 27s | Avg:  9m 43s | Max:  9m 50s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 13m | Avg:  7m 52s | Max: 24m 07s
      🟩 GCC                Pass: 100%/21  | Total:  4h 55m | Avg: 14m 04s | Max:  1h 48m
      🟩 MSVC               Pass: 100%/4   | Total:  1h 53m | Avg: 28m 23s | Max: 29m 43s | Hits: 540%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 27s | Avg:  9m 43s | Max:  9m 50s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 32m 46s | Avg: 16m 23s | Max: 27m 47s
      🟩 rtxa6000           Pass: 100%/8   | Total:  3h 43m | Avg: 27m 57s | Max:  1h 48m
      🟩 v100               Pass: 100%/34  | Total:  5h 06m | Avg:  9m 00s | Max: 29m 43s | Hits: 540%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 23m | Avg:  8m 44s | Max: 29m 43s | Hits: 540%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total:  1h 48m | Avg:  1h 48m | Max:  1h 48m
      🟩 GraphCapture       Pass: 100%/1   | Total: 14m 35s | Avg: 14m 35s | Max: 14m 35s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 16m | Avg: 25m 35s | Max: 27m 47s
      🟩 TestGPU            Pass: 100%/2   | Total: 39m 32s | Avg: 19m 46s | Max: 20m 02s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 32m 46s | Avg: 16m 23s | Max: 27m 47s
      🟩 90a                Pass: 100%/1   | Total:  5m 03s | Avg:  5m 03s | Max:  5m 03s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 15m | Avg:  9m 45s | Max: 29m 43s | Hits: 540%/2664  
      🟩 20                 Pass: 100%/24  | Total:  6h 07m | Avg: 15m 18s | Max:  1h 48m | Hits: 540%/888   
    
  • 🟩 thrust: Pass: 100%/42 | Total: 6h 37m | Avg: 9m 27s | Max: 35m 57s | Hits: 365%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 44s | Avg:  8m 22s | Max: 11m 06s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total:  5h 56m | Avg:  8m 55s | Max: 29m 47s | Hits: 365%/7384  
      🟩 arm64              Pass: 100%/2   | Total: 40m 30s | Avg: 20m 15s | Max: 35m 57s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 43m 29s | Avg:  8m 41s | Max: 22m 57s | Hits: 365%/1846  
      🟩 12.5               Pass: 100%/2   | Total: 30m 36s | Avg: 15m 18s | Max: 16m 05s
      🟩 12.6               Pass: 100%/35  | Total:  5h 23m | Avg:  9m 14s | Max: 35m 57s | Hits: 365%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 11m 56s | Avg:  5m 58s | Max:  6m 35s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 43m 29s | Avg:  8m 41s | Max: 22m 57s | Hits: 365%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 30m 36s | Avg: 15m 18s | Max: 16m 05s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  5h 11m | Avg:  9m 26s | Max: 35m 57s | Hits: 365%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 11m 56s | Avg:  5m 58s | Max:  6m 35s
      🟩 nvcc               Pass: 100%/40  | Total:  6h 25m | Avg:  9m 38s | Max: 35m 57s | Hits: 365%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 47s | Avg:  5m 26s | Max:  5m 51s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 58s | Avg:  5m 59s | Max:  6m 02s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 33s | Avg:  5m 46s | Max:  5m 49s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 58s
      🟩 Clang18            Pass: 100%/7   | Total: 45m 57s | Avg:  6m 33s | Max: 10m 27s
      🟩 GCC7               Pass: 100%/2   | Total: 13m 26s | Avg:  6m 43s | Max:  8m 29s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 49s | Avg:  5m 49s | Max:  5m 49s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 34s | Avg:  5m 47s | Max:  6m 11s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 24s | Avg:  5m 42s | Max:  5m 46s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 31s | Avg:  5m 45s | Max:  6m 00s
      🟩 GCC12              Pass: 100%/2   | Total: 11m 44s | Avg:  5m 52s | Max:  5m 55s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 28m | Avg: 11m 06s | Max: 35m 57s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 51m 23s | Avg: 25m 41s | Max: 28m 26s | Hits: 365%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 58m 14s | Avg: 29m 07s | Max: 29m 47s | Hits: 365%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 30m 36s | Avg: 15m 18s | Max: 16m 05s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 42m | Avg:  6m 02s | Max: 10m 27s
      🟩 GCC                Pass: 100%/19  | Total:  2h 34m | Avg:  8m 07s | Max: 35m 57s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 49m | Avg: 27m 24s | Max: 29m 47s | Hits: 365%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total: 30m 36s | Avg: 15m 18s | Max: 16m 05s
    🟩 gpu
      🟩 rtx4090            Pass: 100%/8   | Total:  1h 05m | Avg:  8m 12s | Max: 11m 25s
      🟩 v100               Pass: 100%/34  | Total:  5h 31m | Avg:  9m 45s | Max: 35m 57s | Hits: 365%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 48m | Avg:  9m 25s | Max: 35m 57s | Hits: 365%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 15m 42s | Avg:  7m 51s | Max:  8m 09s
      🟩 TestGPU            Pass: 100%/3   | Total: 32m 58s | Avg: 10m 59s | Max: 11m 25s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 43s | Avg:  4m 43s | Max:  4m 43s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 10m | Avg:  9m 30s | Max: 28m 27s | Hits: 365%/5538  
      🟩 20                 Pass: 100%/20  | Total:  3h 10m | Avg:  9m 31s | Max: 35m 57s | Hits: 365%/1846  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 23s | Avg: 5m 41s | Max: 9m 08s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  9m 08s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  9m 08s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  9m 08s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  9m 08s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  9m 08s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  9m 08s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 11m 23s | Avg:  5m 41s | Max:  9m 08s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 15s | Avg:  2m 15s | Max:  2m 15s
      🟩 Test               Pass: 100%/1   | Total:  9m 08s | Avg:  9m 08s | Max:  9m 08s
    
  • 🟩 python: Pass: 100%/1 | Total: 24m 53s | Avg: 24m 53s | Max: 24m 53s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 24m 53s | Avg: 24m 53s | Max: 24m 53s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 24m 53s | Avg: 24m 53s | Max: 24m 53s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 24m 53s | Avg: 24m 53s | Max: 24m 53s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 24m 53s | Avg: 24m 53s | Max: 24m 53s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 24m 53s | Avg: 24m 53s | Max: 24m 53s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 24m 53s | Avg: 24m 53s | Max: 24m 53s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 24m 53s | Avg: 24m 53s | Max: 24m 53s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 24m 53s | Avg: 24m 53s | Max: 24m 53s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 89)

# Runner
65 linux-amd64-cpu16
8 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

Comment on lines +72 to +108
const auto num_items_lhs = elements / 2;
const auto num_items_rhs = elements - num_items_lhs;
auto counting_it = thrust::make_counting_iterator(offset_t{0});
thrust::copy_if(
  counting_it,
  counting_it + elements,
  rnd_selector_val.begin(),
  thrust::make_tabulate_output_iterator(write_pivot_point_t<offset_t>{
    static_cast<offset_t>(num_items_lhs), thrust::raw_pointer_cast(pivot_point.data())}),
  select_lhs_op);

thrust::device_vector<key_t> keys_lhs(num_items_lhs);
thrust::device_vector<key_t> keys_rhs(num_items_rhs);
thrust::device_vector<key_t> keys_out(elements);

// Generate increasing input range to sample from
thrust::device_vector<key_t> increasing_input = generate(elements);
thrust::sort(increasing_input.begin(), increasing_input.end());

// Select lhs from input up to pivot point
offset_t pivot_point_val = pivot_point[0];
auto const end_lhs = thrust::copy_if(
  increasing_input.cbegin(),
  increasing_input.cbegin() + pivot_point_val,
  rnd_selector_val.cbegin(),
  keys_lhs.begin(),
  select_lhs_op);

// Select rhs items from input up to pivot point
auto const end_rhs = thrust::copy_if(
  increasing_input.cbegin(),
  increasing_input.cbegin() + pivot_point_val,
  rnd_selector_val.cbegin(),
  keys_rhs.begin(),
  select_rhs_op);
// From pivot point copy all remaining items to rhs
thrust::copy(increasing_input.cbegin() + pivot_point_val, increasing_input.cbegin() + elements, end_rhs);
Collaborator

remark: this is beautiful 💚
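To make the pivot-point split in the snippet above easier to follow, here is a host-only C++ sketch of the same idea, including the predicate swap from the key-value version. It is an illustration under stated assumptions, not the benchmark code: std::vector stands in for thrust::device_vector, plain loops stand in for thrust::copy_if, split_at_pivot, split_result and merges_back are hypothetical names, and the pivot is assumed to be one past the input index of the (elements / 2)-th item selected for lhs. Any index between that position and the next selected item yields the same split, so the exact definition inside write_pivot_point_t does not change the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-only sketch: std::vector stands in for thrust::device_vector
// and plain loops stand in for thrust::copy_if.
struct split_result
{
  std::vector<int> lhs;
  std::vector<int> rhs;
  std::size_t pivot_point; // input index from which every item goes to rhs
};

split_result split_at_pivot(const std::vector<int>& sorted_input,
                            const std::vector<std::uint8_t>& selector,
                            std::uint8_t threshold)
{
  const std::size_t elements      = sorted_input.size();
  const std::size_t num_items_lhs = elements / 2;

  // Precondition: at least half of the items must be selectable into lhs.
  // If fewer than half fall below the threshold, swap the lhs/rhs predicates.
  const auto num_below = static_cast<std::size_t>(
    std::count_if(selector.begin(), selector.end(), [&](std::uint8_t v) { return v < threshold; }));
  const bool lhs_selects_below = num_below >= num_items_lhs;
  auto select_lhs = [&](std::uint8_t v) { return (v < threshold) == lhs_selects_below; };

  // Pivot point (assumed definition): one past the input index of the
  // num_items_lhs-th item selected for lhs.
  std::size_t pivot_point = 0;
  for (std::size_t i = 0, selected = 0; i < elements && selected < num_items_lhs; ++i)
  {
    if (select_lhs(selector[i]))
    {
      ++selected;
      pivot_point = i + 1;
    }
  }

  split_result result{{}, {}, pivot_point};
  // Before the pivot point: selected items go to lhs, all others to rhs
  for (std::size_t i = 0; i < pivot_point; ++i)
  {
    (select_lhs(selector[i]) ? result.lhs : result.rhs).push_back(sorted_input[i]);
  }
  // From the pivot point on, all remaining items go to rhs
  result.rhs.insert(result.rhs.end(),
                    sorted_input.begin() + static_cast<std::ptrdiff_t>(pivot_point),
                    sorted_input.end());
  assert(result.lhs.size() == num_items_lhs);
  return result;
}

// Usage check: lhs and rhs are each sorted, so merging them restores the input
bool merges_back(const split_result& r, const std::vector<int>& sorted_input)
{
  std::vector<int> merged(r.lhs.size() + r.rhs.size());
  std::merge(r.lhs.begin(), r.lhs.end(), r.rhs.begin(), r.rhs.end(), merged.begin());
  return merged == sorted_input;
}
```

Because both ranges are carved out of the same sorted input in order, each stays sorted, which is exactly the precondition cub::DeviceMerge needs; the skew of the selector only changes how the lhs items are distributed across the input, which is what varies the binary-search workload.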

Comment on lines +45 to +111
// Retrieve axis parameters
const auto elements = static_cast<std::size_t>(state.get_int64("Elements{io}"));
const bit_entropy entropy = str_to_entropy(state.get_string("Entropy"));

// We generate data distributions in the range [0, 255] that, with lower entropy, get skewed towards 0.
// We use this to generate increasingly large *consecutive* segments of data that get selected into the lhs
thrust::device_vector<uint8_t> rnd_selector_val = generate(elements, entropy);
uint8_t threshold = 128;
select_if_less_than_t select_lhs_op{false, threshold};
select_if_less_than_t select_rhs_op{true, threshold};

// The following algorithm only works under the precondition that there's at least 50% of the data in the lhs
// If that's not the case, we simply swap the logic for selecting into lhs and rhs
const auto num_items_selected_into_lhs =
  static_cast<offset_t>(thrust::count_if(rnd_selector_val.begin(), rnd_selector_val.end(), select_lhs_op));
if (num_items_selected_into_lhs < elements / 2)
{
  using ::cuda::std::swap;
  swap(select_lhs_op, select_rhs_op);
}

// We want lhs and rhs to be of equal size. We also want to have skewed distributions, such that we put different
// workloads on the binary search part. For this reason, we identify the index from the input, referred to as pivot
// point, after which the lhs is "full". We compose the rhs by selecting all items up to the pivot point that were not
// selected for lhs and *all* items after the pivot point.
constexpr std::size_t num_pivot_points = 1;
thrust::device_vector<offset_t> pivot_point(num_pivot_points);
const auto num_items_lhs = elements / 2;
const auto num_items_rhs = elements - num_items_lhs;
auto counting_it = thrust::make_counting_iterator(offset_t{0});
thrust::copy_if(
  counting_it,
  counting_it + elements,
  rnd_selector_val.begin(),
  thrust::make_tabulate_output_iterator(write_pivot_point_t<offset_t>{
    static_cast<offset_t>(num_items_lhs), thrust::raw_pointer_cast(pivot_point.data())}),
  select_lhs_op);

thrust::device_vector<key_t> keys_lhs(num_items_lhs);
thrust::device_vector<key_t> keys_rhs(num_items_rhs);
thrust::device_vector<key_t> keys_out(elements);
thrust::device_vector<value_t> values_lhs(num_items_lhs);
thrust::device_vector<value_t> values_rhs(num_items_rhs);
thrust::device_vector<value_t> values_out(elements);

// Generate increasing input range to sample from
thrust::device_vector<key_t> increasing_input = generate(elements);
thrust::sort(increasing_input.begin(), increasing_input.end());

// Select lhs from input up to pivot point
offset_t pivot_point_val = pivot_point[0];
auto const end_lhs = thrust::copy_if(
  increasing_input.cbegin(),
  increasing_input.cbegin() + pivot_point_val,
  rnd_selector_val.cbegin(),
  keys_lhs.begin(),
  select_lhs_op);

// Select rhs items from input up to pivot point
auto const end_rhs = thrust::copy_if(
  increasing_input.cbegin(),
  increasing_input.cbegin() + pivot_point_val,
  rnd_selector_val.cbegin(),
  keys_rhs.begin(),
  select_rhs_op);
// From pivot point copy all remaining items to rhs
thrust::copy(increasing_input.cbegin() + pivot_point_val, increasing_input.cbegin() + elements, end_rhs);
Collaborator

suggestion: this looks like a duplicate of the workload generation you added in the keys-only version. Maybe we could simplify the code a bit with a function such as cuda::std::pair<thrust::device_vector<T>, thrust::device_vector<T>> generate(elements, entropy); that both benchmarks call as auto [keys_lhs, keys_rhs] = ...

Collaborator Author

Sounds good! I'll do that in a follow-up PR 👍
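The refactor suggested above could look roughly like the following host-only sketch. Everything here is an assumption for illustration: generate_merge_workload and its num_ands and seed parameters are hypothetical, std::pair and std::vector stand in for cuda::std::pair and thrust::device_vector, the input is an iota range rather than sorted generated keys, and entropy is modeled by AND-ing uniform bytes. Under that model, with num_ands extra ANDs a value stays below the threshold of 128 with probability 1 - 2^-(num_ands + 1), which produces the longer consecutive runs of lhs items described in the code comment. The split is written as a single pass, which yields the same lhs and rhs as the explicit pivot-point computation: selected items fill lhs until it holds elements / 2 items, and everything else goes to rhs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (name and parameters are illustrative, not the benchmark's API).
// std::pair / std::vector stand in for cuda::std::pair / thrust::device_vector.
template <typename T>
std::pair<std::vector<T>, std::vector<T>>
generate_merge_workload(std::size_t elements, int num_ands, std::uint32_t seed)
{
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> byte_dist(0, 255);

  // Skewed selector values in [0, 255]: each extra AND clears more bits,
  // pushing values towards 0 (assumed entropy model).
  std::vector<std::uint8_t> selector(elements);
  for (auto& v : selector)
  {
    int value = byte_dist(rng);
    for (int k = 0; k < num_ands; ++k)
    {
      value &= byte_dist(rng);
    }
    v = static_cast<std::uint8_t>(value);
  }

  // Increasing input to sample from (simplified: iota instead of sorted random keys)
  std::vector<T> input(elements);
  std::iota(input.begin(), input.end(), T{0});

  constexpr std::uint8_t threshold = 128;
  const std::size_t num_items_lhs  = elements / 2;
  const auto num_below             = static_cast<std::size_t>(
    std::count_if(selector.begin(), selector.end(), [&](std::uint8_t v) { return v < threshold; }));
  // Swap the lhs/rhs predicates if fewer than half of the items fall below the threshold
  const bool lhs_selects_below = num_below >= num_items_lhs;

  std::pair<std::vector<T>, std::vector<T>> result;
  auto& [lhs, rhs] = result;
  lhs.reserve(num_items_lhs);
  rhs.reserve(elements - num_items_lhs);
  // Single pass equivalent to the pivot-point split
  for (std::size_t i = 0; i < elements; ++i)
  {
    const bool selected = (selector[i] < threshold) == lhs_selects_below;
    if (selected && lhs.size() < num_items_lhs)
    {
      lhs.push_back(input[i]);
    }
    else
    {
      rhs.push_back(input[i]);
    }
  }
  assert(lhs.size() == num_items_lhs);
  return result;
}
```

With a helper of this shape, both the keys-only and the key-value benchmark could start with auto [keys_lhs, keys_rhs] = ..., and only the value vectors would remain specific to the pairs benchmark.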

elstehle merged commit 38ac752 into NVIDIA:main on Feb 1, 2025
100 of 104 checks passed
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Add benchmarks for cub::DeviceMerge
3 participants